Carga y limpieza preliminar de los datos

Los datos que se van a analizar en este documento, proceden de la compilación hecha por usuarios de Kaggle. La fecha del análisis empieza el 6 de Abril de 2020, utilizando la versión número 73 recopilada en la web anterior.

import pandas as pd
datos = pd.read_csv("covid_19_clean_complete.csv")
datos.head(10)
##                  Province/State       Country/Region  ...  Deaths  Recovered
## 0                           NaN          Afghanistan  ...       0          0
## 1                           NaN              Albania  ...       0          0
## 2                           NaN              Algeria  ...       0          0
## 3                           NaN              Andorra  ...       0          0
## 4                           NaN               Angola  ...       0          0
## 5                           NaN  Antigua and Barbuda  ...       0          0
## 6                           NaN            Argentina  ...       0          0
## 7                           NaN              Armenia  ...       0          0
## 8  Australian Capital Territory            Australia  ...       0          0
## 9               New South Wales            Australia  ...       0          0
## 
## [10 rows x 8 columns]
pd <- import("pandas")
datos <- pd$read_csv("covid_19_clean_complete.csv")
kable(head(datos, 10))
Province/State Country/Region Lat Long Date Confirmed Deaths Recovered
NaN Afghanistan 33.0000 65.0000 1/22/20 0 0 0
NaN Albania 41.1533 20.1683 1/22/20 0 0 0
NaN Algeria 28.0339 1.6596 1/22/20 0 0 0
NaN Andorra 42.5063 1.5218 1/22/20 0 0 0
NaN Angola -11.2027 17.8739 1/22/20 0 0 0
NaN Antigua and Barbuda 17.0608 -61.7964 1/22/20 0 0 0
NaN Argentina -38.4161 -63.6167 1/22/20 0 0 0
NaN Armenia 40.0691 45.0382 1/22/20 0 0 0
Australian Capital Territory Australia -35.4735 149.0124 1/22/20 0 0 0
New South Wales Australia -33.8688 151.2093 1/22/20 0 0 0
datos <- read.csv("covid_19_clean_complete.csv", stringsAsFactors = FALSE)
datos %>% head(10) %>% kable()
Province.State Country.Region Lat Long Date Confirmed Deaths Recovered
Afghanistan 33.0000 65.0000 1/22/20 0 0 0
Albania 41.1533 20.1683 1/22/20 0 0 0
Algeria 28.0339 1.6596 1/22/20 0 0 0
Andorra 42.5063 1.5218 1/22/20 0 0 0
Angola -11.2027 17.8739 1/22/20 0 0 0
Antigua and Barbuda 17.0608 -61.7964 1/22/20 0 0 0
Argentina -38.4161 -63.6167 1/22/20 0 0 0
Armenia 40.0691 45.0382 1/22/20 0 0 0
Australian Capital Territory Australia -35.4735 149.0124 1/22/20 0 0 0
New South Wales Australia -33.8688 151.2093 1/22/20 0 0 0

Estructura de los datos

str(datos)
## 'data.frame':    21484 obs. of  8 variables:
##  $ Province.State: chr  "" "" "" "" ...
##  $ Country.Region: chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
##  $ Lat           : num  33 41.2 28 42.5 -11.2 ...
##  $ Long          : num  65 20.17 1.66 1.52 17.87 ...
##  $ Date          : chr  "1/22/20" "1/22/20" "1/22/20" "1/22/20" ...
##  $ Confirmed     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Deaths        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Recovered     : int  0 0 0 0 0 0 0 0 0 0 ...
colnames(datos) = c("Provincia_Estado",
                    "Pais_Region",
                    "Latitud", # N+ o S-
                    "Longitud", # E+ o W-
                    "Fecha",
                    "Casos_Confirmados",
                    "Casos_Muertos",
                    "Casos_Recuperados"
                    )
datos %>% head() %>% kable() # %>% kable_styling()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados
Afghanistan 33.0000 65.0000 1/22/20 0 0 0
Albania 41.1533 20.1683 1/22/20 0 0 0
Algeria 28.0339 1.6596 1/22/20 0 0 0
Andorra 42.5063 1.5218 1/22/20 0 0 0
Angola -11.2027 17.8739 1/22/20 0 0 0
Antigua and Barbuda 17.0608 -61.7964 1/22/20 0 0 0
  • Cualitativas se convierten con factor o bien as.factor.
  • Ordinales se convierten con ordered.
  • Cuantitativos se convierten con as.numeric.
datos$Provincia_Estado %<>% factor()
datos$Pais_Region %<>% factor()
#datos$Fecha %<>% as.Date(format="%m/%d/%y")
datos$Fecha %<>% mdy()
str(datos)
## 'data.frame':    21484 obs. of  8 variables:
##  $ Provincia_Estado : Factor w/ 81 levels "","Alberta","Anguilla",..: 1 1 1 1 1 1 1 1 6 49 ...
##  $ Pais_Region      : Factor w/ 185 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 9 ...
##  $ Latitud          : num  33 41.2 28 42.5 -11.2 ...
##  $ Longitud         : num  65 20.17 1.66 1.52 17.87 ...
##  $ Fecha            : Date, format: "2020-01-22" "2020-01-22" ...
##  $ Casos_Confirmados: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Casos_Muertos    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Casos_Recuperados: int  0 0 0 0 0 0 0 0 0 0 ...

\[Confirmados = Muertos + Recuperados + Enfermos\]

datos %<>%
  mutate(Casos_Enfermos = Casos_Confirmados - Casos_Muertos - Casos_Recuperados)

datos %>%
  filter(Casos_Confirmados > 10000) %>%
  head(10) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
Hubei China 30.9756 112.2707 2020-02-02 11177 350 295 10532
Hubei China 30.9756 112.2707 2020-02-03 13522 414 386 12722
Hubei China 30.9756 112.2707 2020-02-04 16678 479 522 15677
Hubei China 30.9756 112.2707 2020-02-05 19665 549 633 18483
Hubei China 30.9756 112.2707 2020-02-06 22112 618 817 20677
Hubei China 30.9756 112.2707 2020-02-07 24953 699 1115 23139
Hubei China 30.9756 112.2707 2020-02-08 27100 780 1439 24881
Hubei China 30.9756 112.2707 2020-02-09 29631 871 1795 26965
Hubei China 30.9756 112.2707 2020-02-10 31728 974 2222 28532
Hubei China 30.9756 112.2707 2020-02-11 33366 1068 2639 29659
datos %>% 
  filter(Casos_Enfermos < 0) %>%
  arrange(Provincia_Estado, Fecha) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
Diamond Princess Canada 0.0000 0.0000 2020-03-22 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-23 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-24 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-25 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-26 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-27 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-28 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-29 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-30 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-03-31 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-01 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-02 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-03 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-04 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-05 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-06 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-07 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-08 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-09 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-10 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-11 0 1 0 -1
Diamond Princess Canada 0.0000 0.0000 2020-04-12 -1 1 0 -2
Hainan China 19.1959 109.7453 2020-03-24 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-25 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-26 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-27 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-28 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-29 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-30 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-31 168 6 168 -6
Hainan China 19.1959 109.7453 2020-04-01 168 6 168 -6
datos %>%
  filter(Provincia_Estado == "Hainan") %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
Hainan China 19.1959 109.7453 2020-01-22 4 0 0 4
Hainan China 19.1959 109.7453 2020-01-23 5 0 0 5
Hainan China 19.1959 109.7453 2020-01-24 8 0 0 8
Hainan China 19.1959 109.7453 2020-01-25 19 0 0 19
Hainan China 19.1959 109.7453 2020-01-26 22 0 0 22
Hainan China 19.1959 109.7453 2020-01-27 33 1 0 32
Hainan China 19.1959 109.7453 2020-01-28 40 1 0 39
Hainan China 19.1959 109.7453 2020-01-29 43 1 0 42
Hainan China 19.1959 109.7453 2020-01-30 46 1 1 44
Hainan China 19.1959 109.7453 2020-01-31 52 1 1 50
Hainan China 19.1959 109.7453 2020-02-01 62 1 1 60
Hainan China 19.1959 109.7453 2020-02-02 64 1 4 59
Hainan China 19.1959 109.7453 2020-02-03 72 1 4 67
Hainan China 19.1959 109.7453 2020-02-04 80 1 5 74
Hainan China 19.1959 109.7453 2020-02-05 99 1 5 93
Hainan China 19.1959 109.7453 2020-02-06 106 1 8 97
Hainan China 19.1959 109.7453 2020-02-07 117 2 10 105
Hainan China 19.1959 109.7453 2020-02-08 124 2 14 108
Hainan China 19.1959 109.7453 2020-02-09 131 3 19 109
Hainan China 19.1959 109.7453 2020-02-10 138 3 19 116
Hainan China 19.1959 109.7453 2020-02-11 144 3 20 121
Hainan China 19.1959 109.7453 2020-02-12 157 4 27 126
Hainan China 19.1959 109.7453 2020-02-13 157 4 30 123
Hainan China 19.1959 109.7453 2020-02-14 159 4 43 112
Hainan China 19.1959 109.7453 2020-02-15 162 4 39 119
Hainan China 19.1959 109.7453 2020-02-16 162 4 52 106
Hainan China 19.1959 109.7453 2020-02-17 163 4 59 100
Hainan China 19.1959 109.7453 2020-02-18 163 4 79 80
Hainan China 19.1959 109.7453 2020-02-19 168 4 84 80
Hainan China 19.1959 109.7453 2020-02-20 168 4 86 78
Hainan China 19.1959 109.7453 2020-02-21 168 4 95 69
Hainan China 19.1959 109.7453 2020-02-22 168 4 104 60
Hainan China 19.1959 109.7453 2020-02-23 168 5 106 57
Hainan China 19.1959 109.7453 2020-02-24 168 5 116 47
Hainan China 19.1959 109.7453 2020-02-25 168 5 124 39
Hainan China 19.1959 109.7453 2020-02-26 168 5 129 34
Hainan China 19.1959 109.7453 2020-02-27 168 5 131 32
Hainan China 19.1959 109.7453 2020-02-28 168 5 133 30
Hainan China 19.1959 109.7453 2020-02-29 168 5 148 15
Hainan China 19.1959 109.7453 2020-03-01 168 5 149 14
Hainan China 19.1959 109.7453 2020-03-02 168 5 151 12
Hainan China 19.1959 109.7453 2020-03-03 168 5 155 8
Hainan China 19.1959 109.7453 2020-03-04 168 5 158 5
Hainan China 19.1959 109.7453 2020-03-05 168 6 158 4
Hainan China 19.1959 109.7453 2020-03-06 168 6 158 4
Hainan China 19.1959 109.7453 2020-03-07 168 6 158 4
Hainan China 19.1959 109.7453 2020-03-08 168 6 159 3
Hainan China 19.1959 109.7453 2020-03-09 168 6 159 3
Hainan China 19.1959 109.7453 2020-03-10 168 6 159 3
Hainan China 19.1959 109.7453 2020-03-11 168 6 159 3
Hainan China 19.1959 109.7453 2020-03-12 168 6 160 2
Hainan China 19.1959 109.7453 2020-03-13 168 6 160 2
Hainan China 19.1959 109.7453 2020-03-14 168 6 160 2
Hainan China 19.1959 109.7453 2020-03-15 168 6 160 2
Hainan China 19.1959 109.7453 2020-03-16 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-17 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-18 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-19 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-20 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-21 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-22 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-23 168 6 161 1
Hainan China 19.1959 109.7453 2020-03-24 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-25 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-26 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-27 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-28 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-29 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-30 168 6 168 -6
Hainan China 19.1959 109.7453 2020-03-31 168 6 168 -6
Hainan China 19.1959 109.7453 2020-04-01 168 6 168 -6
Hainan China 19.1959 109.7453 2020-04-02 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-03 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-04 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-05 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-06 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-07 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-08 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-09 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-10 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-11 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-12 168 6 162 0
datos %>%
  filter(Provincia_Estado == "Hainan", Casos_Enfermos < 0) %>%
  mutate(Casos_Recuperados = Casos_Recuperados + Casos_Enfermos,
         Casos_Enfermos = 0) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
Hainan China 19.1959 109.7453 2020-03-24 168 6 162 0
Hainan China 19.1959 109.7453 2020-03-25 168 6 162 0
Hainan China 19.1959 109.7453 2020-03-26 168 6 162 0
Hainan China 19.1959 109.7453 2020-03-27 168 6 162 0
Hainan China 19.1959 109.7453 2020-03-28 168 6 162 0
Hainan China 19.1959 109.7453 2020-03-29 168 6 162 0
Hainan China 19.1959 109.7453 2020-03-30 168 6 162 0
Hainan China 19.1959 109.7453 2020-03-31 168 6 162 0
Hainan China 19.1959 109.7453 2020-04-01 168 6 162 0

Análisis geográfico

#datos_europa = datos[datos$Latitud > 38 & datos$Longitud > -25 & datos$Longitud < 30 , ]

datos_europa = datos %>%
  filter(Latitud > 38, between(Longitud, -25, 30))

nrow(datos_europa)
## [1] 3690
table(datos_europa$Pais_Region) %>%
  as.data.frame() %>%
  filter(Freq > 0) %>%
  kable()
Var1 Freq
Albania 82
Andorra 82
Austria 82
Belarus 82
Belgium 82
Bosnia and Herzegovina 82
Bulgaria 82
Croatia 82
Czechia 82
Denmark 164
Estonia 82
Finland 82
France 82
Germany 82
Greece 82
Holy See 82
Hungary 82
Iceland 82
Ireland 82
Italy 82
Kosovo 82
Latvia 82
Liechtenstein 82
Lithuania 82
Luxembourg 82
Moldova 82
Monaco 82
Montenegro 82
Netherlands 82
North Macedonia 82
Norway 82
Poland 82
Portugal 82
Romania 82
San Marino 82
Serbia 82
Slovakia 82
Slovenia 82
Spain 82
Sweden 82
Switzerland 82
United Kingdom 246
datos_europa %>%
  filter(Fecha == ymd("2020-03-15")) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
Albania 41.15330 20.16830 2020-03-15 42 1 0 41
Andorra 42.50630 1.52180 2020-03-15 1 0 1 0
Austria 47.51620 14.55010 2020-03-15 860 1 6 853
Belarus 53.70980 27.95340 2020-03-15 27 0 3 24
Belgium 50.83330 4.00000 2020-03-15 886 4 1 881
Bosnia and Herzegovina 43.91590 17.67910 2020-03-15 24 0 0 24
Bulgaria 42.73390 25.48580 2020-03-15 51 2 0 49
Croatia 45.10000 15.20000 2020-03-15 49 0 1 48
Czechia 49.81750 15.47300 2020-03-15 253 0 0 253
Faroe Islands Denmark 61.89260 -6.91180 2020-03-15 11 0 0 11
Denmark 56.26390 9.50180 2020-03-15 864 2 1 861
Estonia 58.59530 25.01360 2020-03-15 171 0 1 170
Finland 64.00000 26.00000 2020-03-15 244 0 10 234
France 46.22760 2.21370 2020-03-15 4499 91 12 4396
Germany 51.00000 9.00000 2020-03-15 5795 11 46 5738
Greece 39.07420 21.82430 2020-03-15 331 4 8 319
Holy See 41.90290 12.45340 2020-03-15 1 0 0 1
Hungary 47.16250 19.50330 2020-03-15 32 1 1 30
Iceland 64.96310 -19.02080 2020-03-15 171 5 8 158
Ireland 53.14240 -7.69210 2020-03-15 129 2 0 127
Italy 43.00000 12.00000 2020-03-15 24747 1809 2335 20603
Latvia 56.87960 24.60320 2020-03-15 30 0 1 29
Liechtenstein 47.14000 9.55000 2020-03-15 4 0 0 4
Lithuania 55.16940 23.88130 2020-03-15 12 0 1 11
Luxembourg 49.81530 6.12960 2020-03-15 59 1 0 58
Moldova 47.41160 28.36990 2020-03-15 23 0 0 23
Monaco 43.73330 7.41670 2020-03-15 2 0 0 2
Montenegro 42.50000 19.30000 2020-03-15 0 0 0 0
Netherlands 52.13260 5.29130 2020-03-15 1135 20 2 1113
North Macedonia 41.60860 21.74530 2020-03-15 14 0 1 13
Norway 60.47200 8.46890 2020-03-15 1221 3 1 1217
Poland 51.91940 19.14510 2020-03-15 119 3 0 116
Portugal 39.39990 -8.22450 2020-03-15 245 0 2 243
Romania 45.94320 24.96680 2020-03-15 131 0 9 122
San Marino 43.94240 12.45780 2020-03-15 101 5 4 92
Serbia 44.01650 21.00590 2020-03-15 48 0 0 48
Slovakia 48.66900 19.69900 2020-03-15 54 0 0 54
Slovenia 46.15120 14.99550 2020-03-15 219 1 0 218
Spain 40.00000 -4.00000 2020-03-15 7798 289 517 6992
Sweden 63.00000 16.00000 2020-03-15 1022 3 1 1018
Switzerland 46.81820 8.22750 2020-03-15 2200 14 4 2182
Channel Islands United Kingdom 49.37230 -2.36440 2020-03-15 3 0 0 3
Isle of Man United Kingdom 54.23610 -4.54810 2020-03-15 0 0 0 0
United Kingdom 55.37810 -3.43600 2020-03-15 1140 21 18 1101
Kosovo 42.60264 20.90298 2020-03-15 0 0 0 0

\[d(x, y) = \sqrt{(x_{Lat}-y_{Lat})^2 + (x_{Long}-y_{Long})^2}\]

distancia_grados = function(x, y){
  sqrt((x[1]-y[1])^2 + (x[2]-y[2])^2)
}

distancia_grados_potsdam = function(x){
  potsdam = c(52.366956, 13.906734)
  distancia_grados(x, potsdam)
}

dist_potsdam = apply(cbind(datos_europa$Latitud, datos_europa$Longitud),
                     MARGIN = 1, 
                     FUN = distancia_grados_potsdam)

datos_europa %<>%
  mutate(dist_potsdam = dist_potsdam)

datos_europa %>%
  filter(between(Fecha, dmy("2-3-2020"), dmy("7-3-2020")),
         dist_potsdam < 4) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos dist_potsdam
Czechia 49.8175 15.473 2020-03-02 3 0 0 3 2.992142
Czechia 49.8175 15.473 2020-03-03 5 0 0 5 2.992142
Czechia 49.8175 15.473 2020-03-04 8 0 0 8 2.992142
Czechia 49.8175 15.473 2020-03-05 12 0 0 12 2.992142
Czechia 49.8175 15.473 2020-03-06 18 0 0 18 2.992142
Czechia 49.8175 15.473 2020-03-07 19 0 0 19 2.992142
world <- ne_countries(scale = "medium", returnclass = "sf")

datos$Pais_Region = factor(datos$Pais_Region, levels = c(levels(datos$Pais_Region), "United States"))

datos[datos$Pais_Region=="US",]$Pais_Region = "United States"

world %>%
  inner_join(datos, by = c("name" = "Pais_Region")) %>%
  filter(Fecha == dmy("30-03-2020")) %>%
  ggplot() +
  geom_sf(color = "black", aes(fill = Casos_Confirmados)) +
#  coord_sf(crs="+proj=laea +lat_0=50 +lon_0=10 +units=m +ellps=GRS80") +
  scale_fill_viridis_c(option="plasma", trans = "sqrt") +
  xlab("Longitud") + ylab("Latitud") +
  ggtitle("Mapa del mundo ", subtitle = "COVID 19") -> g
## Warning: Column `name`/`Pais_Region` joining character vector and factor,
## coercing into character vector
ggplotly(g)
datos %>%
  filter(Fecha == dmy("30-03-2020")) %>%
  ggplot(aes(Longitud, Latitud)) +
  geom_point(aes(size = log(Casos_Confirmados+1), colour = log(Casos_Muertos+1))) +
  coord_fixed() +
  theme(legend.position = "bottom") -> g

ggplotly(g)
thr = 1000

datos %>%
  filter(Fecha == ymd("2020-04-05"),
         Casos_Confirmados > thr) %>%
  mutate(Prop_Muertos = Casos_Muertos / Casos_Confirmados, 
         Ranking = dense_rank(desc(Prop_Muertos))) %>%
  arrange(Ranking) %>%
  head(20) %>%
  kable()
Provincia_Estado Pais_Region Latitud Longitud Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos Prop_Muertos Ranking
Italy 43.0000 12.0000 2020-04-05 128948 15887 21815 91246 0.1232047 1
Algeria 28.0339 1.6596 2020-04-05 1320 152 90 1078 0.1151515 2
United Kingdom 55.3781 -3.4360 2020-04-05 47806 4934 135 42737 0.1032088 3
Netherlands 52.1326 5.2913 2020-04-05 17851 1766 250 15835 0.0989300 4
Spain 40.0000 -4.0000 2020-04-05 131646 12641 38080 80925 0.0960227 5
Indonesia -0.7893 113.9213 2020-04-05 2273 198 164 1911 0.0871095 6
France 46.2276 2.2137 2020-04-05 92839 8078 16183 68578 0.0870108 7
Belgium 50.8333 4.0000 2020-04-05 19691 1447 3751 14493 0.0734853 8
Morocco 31.7917 -7.0926 2020-04-05 1021 70 76 875 0.0685602 9
Egypt 26.0000 30.0000 2020-04-05 1173 78 247 848 0.0664962 10
Iran 32.0000 53.0000 2020-04-05 58226 3603 19736 34887 0.0618796 11
Sweden 63.0000 16.0000 2020-04-05 6830 401 205 6224 0.0587116 12
Ecuador -1.8312 -78.1834 2020-04-05 3646 180 100 3366 0.0493692 13
Hubei China 30.9756 112.2707 2020-04-05 67803 3210 63945 648 0.0473430 14
Dominican Republic 18.7357 -70.1627 2020-04-05 1745 82 17 1646 0.0469914 15
Philippines 13.0000 122.0000 2020-04-05 3246 152 64 3030 0.0468269 16
Brazil -14.2350 -51.9253 2020-04-05 11130 486 127 10517 0.0436658 17
Greece 39.0742 21.8243 2020-04-05 1735 73 78 1584 0.0420749 18
Mexico 23.6345 -102.5528 2020-04-05 1890 79 633 1178 0.0417989 19
Denmark 56.2639 9.5018 2020-04-05 4369 179 1327 2863 0.0409705 20
datos$lat_class = cut(datos$Latitud, 
                      breaks = seq(from = -90, to = 90, by = 10))
datos$long_class = cut(datos$Longitud, 
                       breaks = seq(from = -180, to = 180, by = 10))
tt = table(datos$lat_class, datos$long_class)
tt = tt[nrow(tt):1, ]
mosaicplot(t(tt), shade = TRUE)

Análisis de datos temporal

datos_por_fecha = aggregate(
  cbind(Casos_Confirmados, Casos_Muertos, Casos_Recuperados) ~ Fecha,
  data = datos, 
  FUN = sum
)
datos_por_fecha$Casos_Enfermos = datos_por_fecha$Casos_Confirmados - datos_por_fecha$Casos_Muertos - datos_por_fecha$Casos_Recuperados
head(datos_por_fecha)
##        Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
## 1 2020-01-22               555            17                28            510
## 2 2020-01-23               654            18                30            606
## 3 2020-01-24               941            26                36            879
## 4 2020-01-25              1434            42                39           1353
## 5 2020-01-26              2118            56                52           2010
## 6 2020-01-27              2927            82                61           2784
tail(datos_por_fecha)
##         Fecha Casos_Confirmados Casos_Muertos Casos_Recuperados Casos_Enfermos
## 77 2020-04-07           1426094         81865            296259        1047970
## 78 2020-04-08           1511102         88338            324502        1098262
## 79 2020-04-09           1595348         95455            348808        1151085
## 80 2020-04-10           1691717        102525            370234        1218958
## 81 2020-04-11           1771512        108503            395513        1267496
## 82 2020-04-12           1846676        114091            414591        1317994
barplot(Casos_Confirmados ~ Fecha, data = datos_por_fecha)

plot(Casos_Confirmados ~ Fecha, data = datos_por_fecha, col = "blue", type = "l", main = "Casos documentados por día en todo el mundo", xlab = "Fecha", ylab = "Número de personas", log = "y")
lines(Casos_Muertos ~ Fecha, data = datos_por_fecha, col = "red")
lines(Casos_Recuperados ~ Fecha, data = datos_por_fecha, col = "green")

legend("topleft", c("Confirmados", "Muertos", "Recuperados"), 
       col = c("blue", "red", "green"), pch = 1, lwd = 2)

datos_por_fecha_ts <- xts(x = datos_por_fecha[, 2:5],
                          order.by = datos_por_fecha$Fecha)
dygraph(datos_por_fecha_ts) %>%
  dyOptions(labelsUTC = TRUE, labelsKMB = TRUE,
            fillGraph = TRUE, fillAlpha = 0.05, 
            drawGrid = FALSE, colors = "#D9AE55") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2,
              hideOnMouseOut = FALSE) %>%
  dyRoller(rollPeriod = 2)
datos_spain = datos %>% 
  filter(Pais_Region == "Spain") %>%
  select(Fecha, starts_with("Casos_"))

plot(x = datos_spain$Fecha, y = datos_spain$Casos_Confirmados,
     main = "Casos confirmados en España", type = "s", 
     col = "blue", lwd = 2)

datos_por_fecha_ts <- xts(x = datos_spain[, 2:5],
                          order.by = datos_spain$Fecha)
dygraph(datos_por_fecha_ts) %>%
  dyOptions(labelsUTC = TRUE, labelsKMB = TRUE,
            fillGraph = TRUE, fillAlpha = 0.05, 
            drawGrid = FALSE, colors = "#D9AE55") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2,
              hideOnMouseOut = FALSE) %>%
  dyRoller(rollPeriod = 2)
barplot(as.matrix(t(datos_spain[, 3:5])),
        names = datos_spain$Fecha, 
        col = c("red", "green", "yellow"),
        main = "Estudio de casos por tipo en España", 
        xlab ="Fecha", ylab = "Número de personas")
legend("topleft", c("Muertos", "Recuperados", "Enfermos"),
       col = c("red", "green", "yellow"), lwd = 2, pch = 1
       )

datos_spain %<>%
  mutate(Nuevos_Casos_Confirmados = Casos_Confirmados - lag(Casos_Confirmados, n = 1),
         Nuevos_Casos_Muertos = Casos_Muertos - lag(Casos_Muertos, n = 1),
         Nuevos_Casos_Recuperados = Casos_Recuperados - lag(Casos_Recuperados, n = 1)
         )

plot(Nuevos_Casos_Confirmados ~ Fecha, data = datos_spain,
     type = "l", col ="blue", 
     xlab = "Fecha", ylab = "Nuevos casos", 
     main = "Nuevos registros en España")
lines(Nuevos_Casos_Muertos ~ Fecha, data = datos_spain,
      type = "l", col = "red")
lines(Nuevos_Casos_Recuperados ~ Fecha, data = datos_spain,
      type = "l", col = "green")

legend("topleft", c("Confirmados", "Muertos", "Recuperados"), 
       col = c("blue", "red", "green"), 
       lwd = 2, pch = 1)

Modelos de regresión simple

  • \(x\): Variable Independiente: número de días desde el origen de la pandemia
  • \(y\): Variable Dependiente: número de casos confirmados.

\[y = f(x)\]

datos_spain$Dias = as.numeric(datos_spain$Fecha - dmy("22/01/2020"))

Regresión Lineal

\[y = ax+b, a,b\in \mathbb R\]

\[min_{a,b\in\mathbb R} \sum_{i=1}^n (y_i-(ax_i+b))^2\]

mod1 <- lm(Casos_Confirmados ~ Dias, data = datos_spain)
summary(mod1)
## 
## Call:
## lm(formula = Casos_Confirmados ~ Dias, data = datos_spain)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -42810 -28328  -2758  24213  70288 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -37906.8     7005.7  -5.411 6.37e-07 ***
## Dias          1659.9      149.3  11.114  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32010 on 80 degrees of freedom
## Multiple R-squared:  0.6069, Adjusted R-squared:  0.602 
## F-statistic: 123.5 on 1 and 80 DF,  p-value: < 2.2e-16

\[Casos\ Confirmados = 1659.8748272 Dias + -3.7906845\times 10^{4}\]

plot(datos_spain$Dias, datos_spain$Casos_Confirmados)
abline(mod1, col = "red")

plot(mod1$residuals~mod1$fitted.values, xlab = "Valores Ajustados", ylab = "Residuos del modelo")

residuos = mod1$residuals

qqPlot(residuos, distribution = "norm", 
       mean = mean(residuos), sd = sd(residuos))

## [1] 82 81

Regresión exponencial

\[log(y) = ax+b, a,b \in \mathbb R\] \[y = e^{ax+b} = m e^{ax}\]

mod2 <- lm(log(Casos_Confirmados) ~ Dias, data = datos_spain[datos_spain$Casos_Confirmados>0, ])
summary(mod2)
## 
## Call:
## lm(formula = log(Casos_Confirmados) ~ Dias, data = datos_spain[datos_spain$Casos_Confirmados > 
##     0, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6212 -0.7500  0.1967  0.8947  1.7051 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.887489   0.307797  -12.63   <2e-16 ***
## Dias         0.218238   0.006153   35.47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.085 on 70 degrees of freedom
## Multiple R-squared:  0.9473, Adjusted R-squared:  0.9465 
## F-statistic:  1258 on 1 and 70 DF,  p-value: < 2.2e-16

\[Casos\ Confirmados = 0.0204968 \cdot e^{0.2182377\cdot x}\]

plot(datos_spain$Dias, datos_spain$Casos_Confirmados)
lines(exp(mod2$coefficients[1])*exp(mod2$coefficients[2]*datos_spain$Dias), col = "blue")

plot(mod2$residuals ~ mod2$fitted.values, xlab = "Valores Ajustados", ylab = "Residuos del modelo")

residuos = mod2$residuals
qqPlot(residuos, distribution = "norm", 
       mean = mean(residuos), sd = sd(residuos))

## 34 33 
## 24 23

Modelo potencial

\[log(y) = a\cdot log(x)+b, a,b\in \mathbb R\] \[y = e^{a\cdot log(x)+b} = e^b\cdot e^{log(x)^a} = m\cdot x^a\]

mod3 <- lm(log(Casos_Confirmados) ~ log(Dias),
           data = datos_spain[datos_spain$Casos_Confirmados > 0, ])
summary(mod3)
## 
## Call:
## lm(formula = log(Casos_Confirmados) ~ log(Dias), data = datos_spain[datos_spain$Casos_Confirmados > 
##     0, ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8886 -1.1043  0.5966  1.0421  4.7067 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -22.6203     1.3767  -16.43   <2e-16 ***
## log(Dias)     7.7798     0.3695   21.06   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.745 on 70 degrees of freedom
## Multiple R-squared:  0.8636, Adjusted R-squared:  0.8617 
## F-statistic: 443.4 on 1 and 70 DF,  p-value: < 2.2e-16

\[Casos\ Confirmados = 1.5001585\times 10^{-10}\cdot Dias^{0.2182377}\]

plot(datos_spain$Dias, datos_spain$Casos_Confirmados)
lines(exp(mod3$coefficients[1])*datos_spain$Dias^mod3$coefficients[2], col = "green")

plot(mod3$residuals~mod3$fitted.values, 
     xlab = "Valores Ajustados", ylab = "Residuos del modelo")

residuos = mod3$residuals
qqPlot(residuos, distribution = "norm", mean = mean(residuos), sd = sd(residuos))

## 11 12 
##  1  2